Distillation of human–object interaction contexts for action recognition

Abstract

Modeling spatial-temporal relations is imperative for recognizing human actions, especially when a human is interacting with objects while multiple objects appear around the human differently over time. Most existing action recognition models focus on learning the overall visual cues of a scene but disregard a holistic view of human–object relationships and interactions, that is, how a human interacts with objects with respect to short-term task completion and a long-term goal. We therefore argue for improving human action recognition by exploiting both the local and global contexts of human–object interactions (HOIs). In this paper, we propose the Global-Local Interaction Distillation Network (GLIDN), which learns human and object interactions through space and time via knowledge distillation for holistic HOI understanding. GLIDN encodes humans and objects into graph nodes and learns local and global relations via a graph attention network. The local context graphs learn the relation between humans and objects at the frame level by capturing their co-occurrence at a specific time step. The global relation graph is constructed based on video-level human and object interactions, identifying their long-term relations throughout a video sequence. We also investigate how knowledge from these graphs can be distilled to their counterparts for improving HOI recognition. Finally, we evaluate our model by conducting comprehensive experiments on two datasets, Charades and CAD-120. Our method outperforms the baselines and counterpart approaches.
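The two ingredients named in the abstract, attention over human and object graph nodes and knowledge distillation between graphs, can be illustrated with a minimal sketch. This is not the GLIDN implementation: the function names are hypothetical, the attention is plain dot-product attention without learned weights (a simplification of a graph attention network), and the loss is assumed to be a temperature-softened KL divergence, which the abstract does not specify.

```python
import math


def softmax(xs, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in xs]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(nodes, query_index):
    """Aggregate all node features for one query node.

    Assumption: dot-product scores stand in for the learned attention
    coefficients of a real graph attention layer.
    """
    query = nodes[query_index]
    scores = [sum(a * b for a, b in zip(query, node)) for node in nodes]
    weights = softmax(scores)
    dim = len(query)
    return [sum(w * node[d] for w, node in zip(weights, nodes)) for d in range(dim)]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened class distributions.

    Assumption: which graph acts as teacher and which as student is left
    to the caller; the abstract only says knowledge is distilled between
    the graphs and their counterparts.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical frame: node 0 is the human, nodes 1 and 2 are objects.
frame_nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
human_context = attention_aggregate(frame_nodes, 0)

# Hypothetical action logits from a global (video-level) and a local (frame-level) graph.
global_logits = [2.0, 0.5, -1.0]
local_logits = [1.0, 1.0, 0.0]
loss = distillation_loss(global_logits, local_logits)
```

The loss is zero when both graphs predict the same distribution and grows as they disagree, which is the property a distillation objective relies on.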

Similar resources

Action and interaction as contexts for enriching representations

Manipulative Action Recognition for Human-Robot Interaction

Recently, human-robot interaction is receiving more and more interest in the robotics as well as in the computer vision research community. From the robotics perspective, robots that cooperate with humans are an interesting application field that is expected to have a high future market potential. A couple of global and also mid-sized companies have come up with quite sophisticated robotic plat...

Contexts for Human Action

We argue that the mathematics developed for the semantics of computer languages can be fruitfully applied to problems in human communication and action.

Informative joints based human action recognition using skeleton contexts

The launching of Microsoft Kinect with skeleton tracking technique opens up new potentials for skeleton based human action recognition. However, the 3D human skeletons, generated via skeleton tracking from the depth map sequences, are generally very noisy and unreliable. In this paper, we introduce a robust informative joints based human action recognition method. Inspired by the instinct of th...

Deep Alternative Neural Network: Exploring Contexts as Early as Possible for Action Recognition

Contexts are crucial for action recognition in video. Current methods often mine contexts after extracting hierarchical local features and focus on their high-order encodings. This paper instead explores contexts as early as possible and leverages their evolutions for action recognition. In particular, we introduce a novel architecture called deep alternative neural network (DANN) stacking alte...

Journal

Journal title: Computer Animation and Virtual Worlds

Year: 2022

ISSN: 1546-427X, 1546-4261

DOI: https://doi.org/10.1002/cav.2107